Data Preprocessing

The incidents_house_price dataset from the previous data cleaning exercise is used in this document.

The number of categories was reduced from 39 to 15 by combining related crime types under the same category. Reducing the level of detail makes the key patterns easier to see when the data is plotted.
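The grouping step can be illustrated with a small lookup table. This is only a sketch: the raw category names, the group names and the fallback bucket below are assumptions chosen for illustration, since the actual 39-to-15 mapping is not listed in this document.

```python
from collections import Counter

# Hypothetical mapping from raw categories to broader groups.
# The real grouping used for the plots may differ.
CATEGORY_MAP = {
    "LARCENY/THEFT": "Theft",
    "STOLEN PROPERTY": "Theft",
    "DRUG/NARCOTIC": "Drugs and Alcohol",
    "DRUNKENNESS": "Drugs and Alcohol",
    "LIQUOR LAWS": "Drugs and Alcohol",
}


def consolidate(category, default="Other Offenses"):
    """Return the broader group for a raw category, or a fallback bucket."""
    return CATEGORY_MAP.get(category.strip().upper(), default)


# Made-up raw values, including one that is not in the mapping.
raw = ["LARCENY/THEFT", "Stolen Property", "DRUNKENNESS", "TREA"]
grouped = Counter(consolidate(c) for c in raw)
```

Normalizing the raw value (trimming and upper-casing) before the lookup keeps inconsistent spellings from producing spurious categories.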

Density of crimes by category

The number of crimes in each category is shown on a map of San Francisco (using the leaflet library). This provides a spatial view of the areas with a high concentration of crimes, by crime type. The tooltip shows the category followed by the number of crimes in parentheses.

Crime density is higher in the north-east part of the map, which falls within the SOUTHERN, MISSION, CENTRAL and NORTHERN police districts. One possible explanation is the high population density there; the large number of tourists in this area also creates more opportunity for crime. The south-east region has a much lower crime density. Note, however, that there are four hot-spots near the south.

In addition to the plot above, it is instructive to visualize each crime category separately. As an illustration, the plots below show the distribution of crimes for the top two (Theft and Arson) and bottom two (Sexual Offenses and Weapon) categories. The high larceny/theft rate in the city area is probably due in part to the heavy use of public transportation, which makes riders convenient targets for thieves. As mentioned previously, the large number of tourists visiting the area also provides easy victims. The distributions of the other crimes could be explained in the same way (as a pedantic exercise) but are not pursued further in this document.

Number of crimes for each category

The bar plot below compares the number of crimes in each category, summed over all years. From 2003 to 2018, Theft, Arson, Assault and Burglary were the major concerns for the San Francisco Police Department. The Other Offenses and Non-criminal categories, each of which groups several less frequent non-violent crime types, also contribute significantly.

Incidents by month and year

The heatmap below shows the number of crimes by month and year. It makes hot-spots, i.e., month-year combinations with a high number of crimes, easy to see. Crimes increased from 2013 to 2017 compared with the previous years. Crimes are lower during February, November and December. For November and December, the lower population density due to holidays and vacations could explain the lower crime counts; tourism is also low around this time. In contrast, the months from March to October see a relatively larger number of crimes, probably due to the correspondingly higher population density and hence greater opportunity.
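The aggregation behind such a heatmap amounts to counting incidents per (year, month) cell. The following is a minimal sketch of that step, assuming each incident carries a date; the function names and the sample dates are illustrative and not taken from the original analysis.

```python
from collections import Counter
from datetime import date


def month_year_counts(dates):
    """Count incidents per (year, month) cell."""
    return Counter((d.year, d.month) for d in dates)


def as_grid(counts, years):
    """Arrange counts as one row per year with 12 month columns."""
    return [[counts.get((y, m), 0) for m in range(1, 13)] for y in years]


# Made-up incident dates, for illustration only.
incident_dates = [date(2015, 3, 1), date(2015, 3, 9), date(2016, 12, 25)]
counts = month_year_counts(incident_dates)
grid = as_grid(counts, years=[2015, 2016])
```

Each row of `grid` can then be passed to any plotting tool that draws a matrix as a heatmap. Because February is the shortest month, dividing each cell by the number of days in that month before comparing months may be worth considering.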

Trend of incidents vs time

The plot below shows the number of incidents over a 24-hour period for the six most frequent categories. The Time axis corresponds to the 24-hour clock time. Each data point is the total number of crimes from 2003 to 2018 for the corresponding category and time. The trends can be divided into three distinct time slots:

  1. 3 a.m. - 7 a.m.: Has fewer crimes than other times of the day, most likely because the majority of people are at home during this time and there is therefore less opportunity. After 7 a.m. there is a gradual increase in the number of crimes.
  2. 10 a.m. - 1 p.m.: Has a peak at 12 p.m., most likely corresponding to lunch break hours for most organisations.
  3. 5 p.m. - 12 a.m.: Has a peak at 6 p.m., probably because more people are out as they return from work.
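The data points behind this kind of plot can be obtained by bucketing incidents by hour for the most frequent categories. The sketch below assumes each incident is available as a (category, "HH:MM") pair; that record layout, the function name and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def hourly_counts(incidents, top_n=6):
    """Return {category: [24 hourly counts]} for the top_n categories.

    incidents: iterable of (category, "HH:MM") pairs (assumed layout).
    """
    incidents = list(incidents)
    totals = Counter(category for category, _ in incidents)
    top = [category for category, _ in totals.most_common(top_n)]
    table = {category: [0] * 24 for category in top}
    for category, time_str in incidents:
        if category in table:
            hour = int(time_str.split(":")[0])
            table[category][hour] += 1
    return table


# Made-up incidents, for illustration only.
sample = [
    ("Theft", "18:05"),
    ("Theft", "12:30"),
    ("Assault", "18:45"),
    ("Theft", "18:59"),
    ("Arson", "03:10"),
]
table = hourly_counts(sample, top_n=2)
```

Restricting the table to the top categories keeps the plot readable; the peak hour for a category is then `max(range(24), key=table[category].__getitem__)`.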

For each datapoint in the plot above, the plot below provides the breakdown by year. One observes the following:

  1. After 2005, there has been a drastic decrease in the cumulative count of vehicle thefts throughout the day, from ~18000 to ~8000. Nevertheless, the trend for increased thefts after 3pm (“increased” relative to other times during the 24-hour period) is consistent year over year.
  2. After 2009, drug and alcohol cases have seen a reduction from ~13000 to ~10000 cases, although this decrease has been inconsistent year over year.
  3. There has been a drastic increase in the number of thefts every year since 2011, with approximately 5000 cases added every year.

Number of incidents resolved by PD district

The bar plot below shows the number of resolved and unresolved incidents for each PD district. As seen above in the leaflet map, the SOUTHERN, MISSION, CENTRAL and NORTHERN districts have the largest number of crimes, but the SFPD has not been able to resolve most of these cases. Tenderloin is the only police district where the number of resolved cases exceeds the number of unresolved cases.
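The resolved/unresolved split per district can be tallied as follows. This is a sketch under the assumption that each record carries a resolution field in which a specific label (here "NONE") marks an unresolved incident; the label, the function names and the sample rows are illustrative only.

```python
from collections import defaultdict


def resolution_by_district(records, unresolved_label="NONE"):
    """Tally resolved and unresolved incidents per district.

    records: iterable of (district, resolution) pairs (assumed layout).
    """
    counts = defaultdict(lambda: {"resolved": 0, "unresolved": 0})
    for district, resolution in records:
        key = "unresolved" if resolution == unresolved_label else "resolved"
        counts[district][key] += 1
    return dict(counts)


def mostly_resolved(counts):
    """Districts where resolved incidents outnumber unresolved ones."""
    return sorted(d for d, c in counts.items() if c["resolved"] > c["unresolved"])


# Made-up records, for illustration only.
records = [
    ("DISTRICT_A", "ARREST, BOOKED"),
    ("DISTRICT_A", "ARREST, BOOKED"),
    ("DISTRICT_A", "NONE"),
    ("DISTRICT_B", "NONE"),
    ("DISTRICT_B", "NONE"),
    ("DISTRICT_B", "ARREST, BOOKED"),
]
counts = resolution_by_district(records)
```

Calling `mostly_resolved(counts)` on the full dataset would list the districts in which resolved cases are the majority.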

Crimes vs House Prices

The plot below shows the median house price and the number of crimes for each PdDistrict. Its purpose is to visualize whether the number of crimes is related to real estate prices. As can be seen, there is no apparent correlation between house prices and the corresponding number of crimes. A causal relationship cannot be ruled out, but the data at hand is insufficient to test such a hypothesis.
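A visual impression of this kind can be checked numerically with a correlation coefficient computed over the per-district pairs. The sketch below implements the Pearson coefficient using only the standard library. It is not part of the original analysis, and the input lists are placeholders rather than values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one entry per district, in the same order in both lists.
crime_counts = [100, 200, 300, 400]
median_prices = [5.0, 3.0, 6.0, 4.0]
r = pearson(crime_counts, median_prices)
```

A value of `r` near zero would be consistent with the absence of a linear relationship. With only one data point per police district the estimate is very noisy, and a coefficient of any size says nothing about causation.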